home *** CD-ROM | disk | FTP | other *** search
- The internals of Citadel -- not
- for the weak-hearted
-
- 1) OVERVIEW
-
- The fundamental structure of the system is very simple.
- CTDLMSG.SYS is a circular file. New messages are simply written
- around the buffer in an endless circle, overwriting old messages
- in their way. There is no other way of deleting messages, and
- text is never shuffled on disk. Messages are numbered
- consecutively and start with an FF (hex) byte. Except for this FF
- start-of-message byte, all bytes in the message file have the high
- bit set to 0. This means that in principle it is trivial to scan
- through the message file and locate message N if it exists, or
- return error. (Complexities, as usual, crop up when we try for
- efficiency...)
-
- Each room contains a list of message numbers. Each time we enter
- a new message in a room, we slide all the old message-numbers
- down a slot, and probably the oldest one falls off the bottom.
- Reading a room is just a matter looking up the messages one by one
- and printing them out. If the message has been overwritten
- already, we just ignore it.
-
- Implementing the "new message" function is also trivial in
- principle: we just keep track, for each caller in the userlog, of
- the highest-numbered message which existed on the >last< call.
- (Remember, message numbers are simply assigned sequentially each
- time a message is created. This sequence is global to the entire
- system, not local within a room.) If we ignore all message-numbers
- in the room less than this, only new messages will be printed.
- Voila! Code up the system described thus far, and you'll have a
- good approximation to Version 1. Better stop and reread
- everything to here, so you can pick out the fundamental mechanisms
- among all of Version 2's bells and whistles.
-
-
- 2) MESSAGE FORMAT ON DISK (CTDLMSG.SYS)
-
- A message consists of a sequence of character strings. Each
- string begins with a type byte indicating the meaning of the
- string and is ended with a null. All strings are printable ASCII:
- in particular, all numbers are in ASCII rather than binary. This
- is for simplicity, both in implementing the system and in
- implementing other code to work with the system. For instance, a
- database driven off Citadel archives can do wildcard matching
- without worrying about unpacking binary data such as dates first.
- To provide later downward compatability, all software should be
- written to IGNORE fields not currently defined.
-
- The type bytes currently defined are:
-
- BYTE Mnemonic Comments
-
- 0xFF Start-of-message indicator. Followed by local
- message ID# as ASCII string, null-terminated as
- always. This byte is the >only< byte which has
- the high bit set in a Citadel message.buf file.
- This field must be present in every message.
- A Author Name of originator of message.
- C Time Time message was created.
- D Date Date message was created.
- I Info Extra information about the originating system
- (not found in non-net messages) that is displayed
- in the message header.
- J Subject Subject field.
- M Message Text of message. Is last field in a message, by
- definition. Following data will be ignored.
- This field must be present in every message.
- N Name Human name for node originated on. Used on
- title line of foreign messages. Ex:
- ODD-DATA
- will produce a title message something like
- 82Nov23 from Cynbe ru Taren @ODD-DATA
- O Origin ID of node message originated on: Country code plus
- phone number of system. (Not stored for locally
- originated messages.) Ex:
- US 206 633 3282
- R Room Room of origin. Topic.
- S Source ID# Message ID # on system message was created on.
- Two 16-bit integers (high and low halves of
- full 32-bit ID#) separated by a blank. Ex:
- 0 13654
- T To Addressee. Used for private mail messages.
- Q ? Citadel-86 internal.
- P ? Citadel-86 internal.
- # Message-ID Usenet-style message-id.
- Z Route Routing code for roomsharing. See NETHACK.DOC
- for details on it.
-
- 2.1) EXAMPLE
-
- Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a
- message which prints as
-
- LOGLAN> read new
-
- 82Nov04 From James Cooke Brown
- Loi uiue i Ti logla
-
- might be stored as
-
- <FF>0 3583<0>D82Nov04<0>AJames Cooke Brown<0>RLOGLAN<0>MLoi uiue i Ti logla<0>
- |--Local ID--|---Date---|-----Author---------|--Room---|-------Message--------|
-
- The date, room and author fields could be in any order. Not all
- fields are printed by default. The local ID# and Room field are
- suppressed here. An isolated system will not normally have use
- for fields beyond those used in this example.
-
- Lines are marked with C Newline (ASCII LF) characters, within the
- message text proper. Returns (ASCII CR) are also used for _hard_
- carriage returns; when encountered, the formatter will
- _immediately_ do a newline.
-
-
- 3) ROOM RECORDS (CTDLROOM.SYS)
-
- The rooms are indices into CTDLMSG.SYS, the message file. As
- noted in the overview, each is essentially an array of pointers
- into the message file. The pointers consist of a 32-bit message
- ID number together with a 16-bit psuedo-sector offset within
- CTDLMSG.SYS telling us where the message begins.
-
- Since messages are number sequentially and written circularly,
- the set of messages existing in CTDLMSG.SYS will always form a
- continuous sequence at any given time. Thus, by remembering the
- ID numbers of the oldest and newest messages in the message file,
- we can check to see if a message exists >before< going to disk,
- saving ourselves (and the disk drive) the pain of futile seeks in
- search of the nonexistent. This information is recorded in oldest
- and newest, 32 bit numbers. You'll be seeing more of these...
-
- The newest is simply incremented each time we enter a new message
- in the message files. Oldest is incremented each time we
- overwrite an FF (start-of-message) byte in the course of writing a
- new message into the files. This corresponds to dead reckoning --
- current code never checks to see that the message number of the
- message we are overwriting is what we think it is. In a garbaged
- file with extra FF bytes around, this could cause oldest to count
- too rapidly, eventually perhaps overtaking newest, at which time
- the system will look completely empty. If you suspect something
- like this is going on, just use configur.exe to rebuild accurate
- numbers.
-
- That should be enough background to tackle a full-scale room. From
- ctdl.h :
-
- struct aRoom { /* The appearance of a room: */
- unsigned rbgen; /* generation # of room */
- struct rflags rbflags; /* same bits as flags above */
- label rbname; /* name of room */
- ulong rblastNet;
- ulong rblastLocal;
- ulong rblastMessage;
- char rbfill[8];
- pathbuf rbdirname; /* user directory for this room's files */
- struct aRmsg {
- ulong rbmsgNo; /* every message gets unique# */
- int rbmsgLoc; /* sector message starts in */
- } msg[MSGSPERRM];
- } ;
-
- [Note that all components start with "rb" for roomBuf, to make
- sure we don't accidentally use an offset in the wrong structure.
- Be very careful also to get a meaningful sequence of components --
- C lets you mix and match structure components without complaint.]
-
- Rbgen handles the problem of rooms which have died and been
- reborn under another name. This will be clearer when we get to
- the log file. For now, just note that each room has a generation
- number which is bumped by one each time it is recycled.
-
- Rbflags is just a bag of flags recording the status of the room.
- The defined flags are:
-
- INUSE Is this room in use?
- PUBLIC Is this room public?
- ISDIR Is this room directory?
- PERMROOM Is this room permanent?
- UPLOAD Can files be written to this room?
- DOWNLOAD Can files be read from this room?
- SHARED Is this a shared room?
- ARCHIVE Is this room archived somewhere?
- ANON is this an anonymous room?
- INVITE is this an invitation-only room?
- NETDOWNLOAD Can files be grabbed by network callers?
- AUTONET Net all the messages entered in this room?
-
- Rbname is just an ASCII string (null-terminated, like all strings)
- giving the name of the room.
-
- Rbdirname is meaningful only in ISDIR rooms, in which case it
- gives the full pathname of the directory to window.
-
- Finally, msg is the array of pointers into the message file.
- RbmsgNo is the ID number of the message, and RbmsgLoc is the
- sector it begins in. (For NIL, we stick the value -1 in
- RbmsgLoc.)
-
-
- RoomTab is just a little index into CTDLROOM.SYS, to keep us from
- constantly pounding around on the disk looking for things. It
- basically records the name and status of each room. It is 100%
- redundant with the file itself... as all our indices are. (As
- all indices should be!) Note that RoomTab is a significant
- consumer of RAM all by itself. It is RAM well spent, but if you
- have to shave Citadel a few K to make it fit your system, cutting
- the number of rooms a bit is one try.
-
- The only field new to us in roomTab is rtlastMessage, recording
- the most recent message in the room. When we are searching for
- rooms with messages a given caller hasn't seen, we can check this
- number in RAM and avoid unprofitable disk accesses.
-
-
- 4) LOG RECORDS (CTDLLOG.SYS)
-
- This is the fun one. Get some fresh air and plug in your
- thinking cap first. (Time, space and complexity are the eternal
- software rivals. We've got however many log entries x about 500
- messages spread over up to 128 rooms to worry about, and with
- floppies disk access time is important... so perforce, we opt for
- lots of complexity to keep time and space in bounds.)
-
- To understand what is happening in the log code takes a little
- persistence. You also have to disentangle the different
- activities going on and tackle them one by one.
-
- o We want to remember some random things such as terminal screen
- size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
- messages, efficiently. Messages should stay new even if it takes
- a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
- about, and treat them as normal rooms. This means mostly
- automatically seeking out those with new messages. (Obviously,
- we >don't< want to do this for unknown private rooms!) This has to
- be secure against the periodic recycling of rooms between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
- passwords at login) and some assurance that messages are from who
- they purport to be from (within the system -- one shouldn't be
- able to forge messages from established users).
-
- Lifting another page from ctdl.h gives us:
-
- struct logBuffer { /* The appearance of a user: */
- uchar lbnulls; /* #nulls */
- struct lflags lbflags; /* UCMASK, LFMASK, EXPERT, AIDE, INUSE */
- uchar lbwidth; /* terminal width */
- int credit; /* Credit for long distance calls */
- label lbname; /* caller's name */
- label lbpw; /* caller's password */
- long lblimit; /* # bytes the user can download today */
- long lblast; /* last day the user logged in */
- uchar lbgen[MAXROOMS]; /* 5 bits gen, 3 bits lastvisit */
- ulong lbvisit[MAXVISIT];/* newestLo for this and 3 prev. visits */
- unsigned lbslot[MSGSPERRM]; /* for private mail */
- ulong lbId[MSGSPERRM]; /* for private mail */
- } ;
-
- Looks simple enough, doesn't it? One topic at a time:
-
- 4.1) RANDOM CONFIGURATION PARAMETERS
-
- These are in the first three fields in the record. Lbnulls gives
- the number of nulls to print after a newline, for folks who like
- simultaneous hardcopy. Or any remaining ASR33 aficionados
- calling up... Lbwidth is the caller's screen width. We format
- all messages to this width, as best we can. Lbflags is another
- bit-bag, recording uppercase-only folks, people who need a
- linefeed after a carraige-return, people who want to suppress the
- little automatic hints all through the system, and people who like
- to see the time a message was created.
-
-
- 4.2) FINDING NEW MESSAGES
-
- This is the most important. Thus, it winds up being the most
- elaborate. Conceptually, what we would like to do is mark each
- message with a bit after our caller has read it, so we can avoid
- printing it out again next call. Unfortunately this would suck
- disk space like mad and you'd have to read a lot of unprinted
- message off disk every time a user passes through.
-
- The approximation comes in doing things at the granularity of
- rooms rather than messages. Messages in a given room are "new"
- until we visit it, and "old" after we leave the room... whether
- we read any of them or not. This can actually be defended: anyone
- who passes through a room without reading the contents probably
- just isn't interested in the topic, and would just as soon not be
- dragged back every visit and forced to read them. Given that
- messages are numbered sequentially, we can simply record the most
- recent message ID# of each room as of the last time we visited it.
- With 128 rooms, this would give us (for each user) an array of 128
- longs, or 512 bytes.
-
- This is still too much -- I'd like the complete log record for a
- user to be as tiny as possible, and we have other stuff to do yet.
-
- So, we complicate a little more. We record in lbvisit[MAXVISIT]
- the most recent message ID# in the system on each of the last six
- calls or so. Now, for each room, we can just indicate which call
- we last visited the room on. This takes 3 bits per room, which we
- stash in the low three bits of lbgen[MAXROOMS]. Now we're down to
- 128 rooms x 3 bits (plus a few bytes in lbvisit[], of course),
- which is quite reasonable.
-
- Putting it all together, we can now compute whether a given room
- has new messages for our current caller without going to disk at
- all:
-
- > We get the lbgen[] entry for the room in question
- > We mask off the lower 3 bits
- > We use the result as an index into lbvisit[], getting the ID number
- of the most recent message in the system as of the last time we
- visited the room.
- > We compare this with roomTab[].rtlastMessage, which tells us
- what the most recent message in the room is currently.
-
-
- 4.3) REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
- This looks trivial at first glance -- just record one bit per
- room per caller in the log records. The problem is that rooms get
- recycled periodically, and we'd rather not run through all the log
- entries each time we do it. So we adopt a kludge which should
- work 99% of the time.
-
- As previously noted, each room has a generation number, which is
- bumped by one each time it is recycled. As not noted, this
- generation number runs from 0 -> 31 (and then wraps around and
- starts over). Thus, these numbers take 5 bits to represent. By a
- miraculous coincidence, we have exactly 5 bits left in the lbgen[]
- entries in the log records. [Anyone familiar with "capability"
- pointers will be encountering deja vu here...]
-
- When someone visits a room, we set the generation number in
- lbgen[] equal to that of the room. This flags the room as being
- available. If the room gets recycled, on our next visit the two
- generation numbers will no longer match, and the room will no
- longer be available -- just the result we're looking for.
- (Naturally, if a room is marked as PUBLIC, all this stuff is
- irrelevant.)
-
- This leaves only the problem of an accidental matchup between the
- two numbers giving someone access to a Forbidden Room. We can't
- eliminate this danger completely, but it can be reduced to
- insignificance for most purposes. (Just don't bet megabucks on
- the security of this system!) Each time someone logs in, we set
- all "wrong" generation numbers to be one less than the actual
- generation of the room. This means that the room must be recycled
- thirty-one times before an accidental matchup can be achieved. (We
- do this for all rooms, INUSE or dead, public or private, since any
- of them may be reincarnated as a Forbidden Room.)
-
- Thus, for someone to accidentally be lead to a Forbidden Room,
- they must establish an account on the system, then not call until
- some room has been recycle thirty-one to thirty-two times, which
- room must be reincarnated as a Forbidden Room, which someone must
- now call back (having not scrolled off the userlog in the mean
- time) and read new messages. The last clause is about the only
- probable one in the sequence. The danger of this is much less
- than the danger that someone will simply guess the name of the
- room outright...
-
-
- 4.4) SUPPORTING PRIVATE MAIL
-
- Can one have an elegant kludge? This must come pretty close.
-
- Private mail is sent and recieved in the Mail> room, which
- otherwise behaves pretty much as any other room. To make this
- work, we store the actual message pointers in lbslot[] and lbId[]
- in the caller's log record, and then copy them into the Mail> room
- array whenever we enter the room. This requires a little fiddling
- to get things just right. We have to update
- roomTab[MAILROOM].rtlastMessage at login to reflect the presence
- or absence of new messages, for example. And MakeMessage() has to
- be kludged to ask for the name of the recipient of the message
- whenever a message is entered in Mail>. But basically it works
- pretty well, keeping the code and user interface simple and
- regular.
-
-
- 4.5) PASSWORDS AND NAME VALIDATION
-
- LogTab[] indexes CTDLLOG.SYS, giving us a quick way of finding
- people. It is basically a chronologically sorted hash table. We
- keep a two-byte hash of the name and password of each caller in
- RAM. When someone tries to log in, we just whip through the table
- in order looking for matches on the password hash and loading the
- matching logfile entry in. Bogus hits are eliminated by the
- simple expedient of refusing to acknowledge a new user who's name
- or password hashes on top of an existing user. Computer
- chauvinism at it's best...
-
- This makes it difficult to forge messages from an existing user.
- (Fine point: nonprinting characters are converted to printing
- characters, and leading, trailing, and double blanks are deleted.)
-